Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

Abstract

In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches to learning dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial, and is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL, simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed with neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baseline models.

Bibliographic details

Authors
Liu, Bing; Lane, Ian
Year 2017
Original format PDF

Similar literature

1. Adaptive Iterative Learning Control for Subway Trains Using Multiple-Point-Mass Dynamic Model Under Speed Constraint [J]. Liu Genfeng, Hou Zhongsheng. IEEE Transactions on Intelligent Transportation Systems. 2021, No. 3
2. Position calculation models by neural computing and online learning methods for high-speed train [J]. Chen Dewang, Han Xiaojie, Cheng Ruijun. Neural computing & applications. 2016, No. 6
3. Neural Network Models Trained with Kalman Learning Rule for Reservoir Inflow Forecasting [J]. M. J. Diamantopoulou, P. E. Georgiou, D. M. Papamichail. WSEAS Transactions on Environment and Development. 2006, No. 2
4. Iterative policy learning in end-to-end trainable task-oriented neural dialog models [C]. Bing Liu, Ian Lane. 2017 IEEE Automatic Speech Recognition and Understanding Workshop. 2017
5. Learning Task-Oriented Dialog with Neural Network Methods [D]. Liu, Bing. 2018
6. UKF-based closed loop iterative learning control of epileptiform wave in a neural mass model [O]. Bonan Shan, Jiang Wang, Bin Deng. 2015
7. An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog [O]. Liu, Bing; Lane, Ian. 2017
